{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Face Recognition Using Siamese Network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will understand the siamese network by building the face recognition model. The objective of our network is to understand whether two faces are similar or dissimilar. We use AT & T's the Database of Faces which can be downloaded from here (https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html)\n",
    "\n",
    "Once you have downloaded and extracted the archive, you can see the folders like s1, s2 up to s40 as shown here:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![title](Images/1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each of these folders has 10 different images of a single person taken from various angles. For an instance, let us open folder s1. As you can see, there are 10 different images of a single person:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![title](Images/2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will open and check folder s13,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![title](Images/3.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we know that siamese networks require inputs as a pair along with the label, we have to create our data in such a way. So we will take two images randomly from the same folder and mark it as a genuine pair and we will take a single image from two different folders and mark them as an imposite pair. A sample data is shown in the below figure, as you can notice a genuine pair has images of the same person and imposite pair has images of a different person. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![title](Images/5.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once we have our data as pairs along with their labels, we train our siamese network. From the image pair, we feed one image to the network A and another image to the network B. The role of these two networks is only to extract the feature vectors. So, we use two convolution layers with relu activations for extracting the features. Once we have learned the feature, we feed the resultant feature vector from both of the networks to the energy function which measures the similarity,  we use Euclidean distance as our energy function. So, we train our network by feeding the image pair to learn the semantic similarity between them.  Now, we will see this step by step. \n",
    "\n",
    "\n",
    "First, we will import the required libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    }
   ],
   "source": [
    "import re\n",
    "import numpy as np\n",
    "from PIL import Image\n",
    "\n",
    "from sklearn.model_selection import train_test_split\n",
    "from keras import backend as K\n",
    "from keras.layers import Activation\n",
    "from keras.layers import Input, Lambda, Dense, Dropout, Convolution2D, MaxPooling2D, Flatten\n",
    "from keras.models import Sequential, Model\n",
    "from keras.optimizers import RMSprop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we define a function for reading our input image. The function read_image takes input as an image and returns the numpy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def read_image(filename, byteorder='>'):\n",
    "    \n",
    "    #first we read the image, as a raw file to the buffer\n",
    "    with open(filename, 'rb') as f:\n",
    "        buffer = f.read()\n",
    "    \n",
    "    #using regex, we extract the header, width, height and maxval of the image\n",
    "    header, width, height, maxval = re.search(\n",
    "        b\"(^P5\\s(?:\\s*#.*[\\r\\n])*\"\n",
    "        b\"(\\d+)\\s(?:\\s*#.*[\\r\\n])*\"\n",
    "        b\"(\\d+)\\s(?:\\s*#.*[\\r\\n])*\"\n",
    "        b\"(\\d+)\\s(?:\\s*#.*[\\r\\n]\\s)*)\", buffer).groups()\n",
    "    \n",
    "    #then we convert the image to numpy array using np.frombuffer which interprets buffer as one dimensional array\n",
    "    return np.frombuffer(buffer,\n",
    "                            dtype='u1' if int(maxval) < 256 else byteorder+'u2',\n",
    "                            count=int(width)*int(height),\n",
    "                            offset=len(header)\n",
    "                            ).reshape((int(height), int(width)))\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For an example, Let us open one image,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAFwAAABwCAAAAAC26kjJAAAZDUlEQVR4nC2ZS6+u2XWV55hzrrXe\n97vtvc+tLq4qm9hxiBEoISQEQgOJQBo06CAhevkttGgh0YEOiN8AEihEkIQEEWECWCE2DvhSKdtV\ndU6d297f7X3Xmhca5heM1pCe8Qz8Aox5ei1b03Z7M9n9uJ5Wxiptmg9PP/CXb4/vn+6fx/ZJ79d6\nffbxMzd/96Xo9aq4DJlY+Xqedgc6ye3z/fmuvzoN3qcVCvV2ae1EZSy5ffsevBwXqMtmu626L5++\nc5ftJ/uDvb5ZirZyvsyYxqVkjGOss6Sd6eAFl82QyzTuih0j4fFmxxZQCimDAPM8tpdy3Q9zvtkW\nrjdNt88+LbopR2rbV9tnb6a+59rLviyQt9xtaJ9LaycjRLFnNHu10/BCfQny4qpWTJjd2h2dbn1e\nbzdvl25N3zy8f/PD65OXlxf7c3IdPPZeKOuQ0SiFB+/gysADAoU2ztXJZh7zfQ9+vSOGOhHCkU/6\n+fG2g3q89IcnN2PctPvr/MXzP/eaP50XRmMq8zFnmsXqsh0kU5uKTpfnZwnUrBeDDxy3TO2hLTJE\nhiJY1mudbOpsqW9O7YSbybnNKs99z3/6lZf70+64jSpbecOHzcPNsTEHt/lxAmXb7i8PU+Q57Yu7\nq+Zbva66SIKjcEjkmcGau8tyvV7jlT3+8uF6jMv6RV7O0j99XGjZ0U25ma/TYc8TsPPoPD+tMyFO\nbVPvdiyLuAZ9cUZWisa8Cg9loLTrXZQhcZ+bUfSDzf16mR4PkUHRyv07GwqO+Yhq0pzm610lb0mX\nx1vxxa90Ez3Po/pYVkVeLzqnXcupgJHC/mTqb8Z69fUSIfX2ILfvC93fbbbbV3fb+0Z7eeR366Xt\nd7EzrtfN4/3jlsuDS+V5JgPUrg/dUswv26z5bJulk5aVQ+vb/e6VdaEO2U7juOE4Ubt8+QjYvp8e\nHWsqF683INrLlsbQaywPm3kQX0rheyP3eT2P8/Yal4Pl8e54LdDkpDhvPvpuvW/3WWXj97yk7dsO\n6Y9mWmx7ikdR5k3HlLkbdbOJa2+yvZil7xf2KbZFO6nvrmW9oi3seS80igbK1RX/k7ZFWlD4+zdn\n5dsqbTuNrNTkvGmmvNEaLkSe+5bjtGoc6Ox+nC+5ag1AIRlNl2vUsn/ok7dFpet1jmXH113fHyee\nNqq6PdSZtbSSiJSQXovs5ljA7IK7B9zkfB2j9s2lS6wTQqJnrrt7v3GdXh2evPbsqslgDXv2hoz0\nse2nw6XctY2qZkqwJkdyKU32HMJOlSTrWiZt5z5KtHHxGmX/kOvukg+86qH59jIVzjGrjCKE+lpb\ngKe6eRT8jjQRytAkK9JFc9YdGi+bNi6k1NoxbXKaz9dSYzzf+KCy3j70wUdlOzzoGqfZJfTajmXz\nIMd3b47nbLvdvD6ZY7aQ0BTSdJHCcWjYr7xN7rxByUPmtbRe20PIewstfm4VC29fYdidjpI9Xu0i\nVUTeycOP84vFxy33jtvqJV3ESwiSE1uOumnq2or3itTKFLHZ+et1S9eId1/0scfm5Y2+feSv7/C2\n1tf3bEykvLfLF1/fPdSuemy620KoUNXCLEkk0Clo1uhZd7UXBKIXZ0XElmwu64hHp6Xv7eZodtw9\nWhE9A4NIU3M9iXh9cmp2ePx4s0cBt2RpEjWYmFhDqjEjxZBGgCddt6cSul10yJXy3dZyAtaNUdrm\n04OUS10TrPq2DsNoe+rbxzfblgEuhCZCDGIVIolMwPgs1MWWIq62xJXSqRAx+mHjTm5jb/Lk26/q\nw+2tvtlykFLNiNd9quNpD92QhkSJSaIEI70xktJdoUaZmZ7uWH2VJM5g1TGV48yxXm7yuLHPvvS9\nNZ9v2Kpm6nT1VV5tr0JQKFUAjVRJQrMEHIHqhM6IGkFGRBkokUjikrJW221yjZonwlj4tLvnB9uU\nSgENXC1TeRq6KeLJxVmQSgJOBjs4AwyRwZxuFICzM1wiJL2kplAddt2N+80StWrBaZMuGjyNlr7t\n3mNL2SuDKjMzKTGxJwvCKRluNEY6RNKRlChOFMMiDBkBHKBNcmo81VnOxaUTL9a91S2tuyFck1NC\nIEHErFmJRDg5s0phbaRM4gmwu0kwiSQ82JNGKm1RfJUsUi+KTHDatd2aYD/PTcApqcKkhRDJKJ7M\nLEquIZwE4VQFIBSdQkAcklwiEknnHeeyttPwd+CLBDMUD4OnVRxIC63KBCQzEglCKkh5FmKwIDwy\nncKIBBHGBGhXCoYOPbfxrHF2n+beIvg0bXnht9gcO2WwUBLUiYUyANek8AgaHplMCQskJ4E0kxyR\nAMsApUULuZTr0yfN5bOYVwEz3fUSj6ljlkxiFWEpgshgBidBCznCg+CJSEGyA+FK7MlKlJkAi4ma\nX6xfx136A4RJhz7QjWsNDG5BQoJASVDXHEgOTypckxHJiCxJQQB7MhEjmDgkktySwWVzste+bF9H\nyGBecGoF63VbJqlbVUTRRIQjFVw4keqUSglABAwqQQkFMVO2YICUU5LXUja5vV6zL6UIE7GWgbMt\nQJXNXW1oFKAAkZOLIKUINdZkFXVLIqEi7E4AlNUEkfAhWcZE1/TzmmHJGBDwPqwIdFjXWEklW4iS\nRNRIH0tmgHgKUHdmZUSKEVgDHk7MyYoEm+btmi13E5MGiCai4FIej9O00VZFqwo4yJ0I6lWYm0PA\nCAF2hadkkXTOsEzWQhouCCkGdrG7oHJZ7jZLhTF5KjfrYcubVjcysbAQhBgQ0gSBKoQJwkk2wcuq\nU8AgNTncEWBypURwcl12qXx8xXmIiUloMC9m03XDq2CujTORHKCUpCAKlEzOLHNBJKNRiOcgB4UY\ncSo4kIUhUdGg7d1q/JBkSgAvYvW9PJ6WrCREHpFOEICZ4MgkZh6RNFFejSgBwJJSUjIpnOGURAHa\ndfT7D+MRW3ZbAsm2Pz/emdcJtN9rGoU7hWeyFAaCksmEBmA6MUVYgszIR2YSA541OZlo6FTW3Y/8\nuOFgATz5TNObZbT20L3oSB5kGY7ISCSSHZykgzPABEktNMjp6mbRM4yAVINJwHZ83h/HZSkTdkHE\njKWdjhtvNG53jJHDWZgHEboFCEzmIPjoNJwCfSQ50n+a3gMaJAKkOEmdlhWr9FpjMhUVl/Xl1pdu\nLB5ZksRCNJMpMwWUSbFosgeFe1ImqZuEkFOEp8taSI0CBFQJaFQXz4hUYcv57s1poWlQejbEsUwk\nphhqKVbYKMmZeESNBIiIJdXCxSJhYmujeUTCY//pUrb3twurJ6BW54e72c6bqyYZg3M93oYtbaEh\n8DF5BvfGJpbpAQ0ydVt7XktKNWlXHsbCbBK7H13jft9y6lRJTNl98+qD8/FuIacQDb+IL9pPPjL0\nZrUdyaB1IitZQtDFeFxen1fibOtOclP9QgipPNph/2cr6N3PtZnpKsq+OQ7qUYidA9mpCFn2U1hO\n+XYzVqGkKVLAZgVYO4/VyK9Zoket3pl7502HxDRR5C4Py+bkyaLpRXK1SursEit7nU+m0YelyYzg\nGkAGh5Be1Sl9SU7Zzdc2Vb2UWXMaMdYkYBvv/t8NjtDXX/5YkcoiZbxsR59IKMyYo2yvrE9Jr0Fd\nS+NoddkwGRhGmqWT3jhRaGHZoU4V18t1GZWZt4c7v0ivdN2ehJT4+uwzXY/bNAqP2Gar20N2t2m/\nBFFlnqrOhbsh4AKlfRfq3QZ60a0MLjIHhXF4fvCjqVwi889T3Aspffj4s+dyzyHqMJ35MM/mI9v5\nen3bqe1nhDIKYqQghjGE/fz6+HZL3376qtbpy4fbu2o0u3J/tP/Jgd4++uu/+Pbh33xh+msf2Zfq\nm9zkDQNbqnLYL3Q5vf3iR+1y8mdxePRoBTYjHMYUujB4vP1J0nt/4cWLn/2Vf/mVH3z5R9/6xoel\n9eIaeaAx3vmlj1p79vf/GeuvdLqVb6KuEwtPtm10efnjj/ef0Ke/+e+e/uQvvXj54vSB66UG0aji\ngxZOuVz1yX635ncOm2/0rx/+/beOXxMoOGT/eHn8M1+7GRf7+i/9Z92su7vt5r8/+rN9kVJFZf3i\nW/0rP9/+6NE//9pay8/mp4cv3j/6pGoGKZdYvVzO773dlut7H73RJ3/j2bsP79XjZx9KIWJ/Itsn\nX9/oi89+7vv/8D/pnz57/dVpmr74ZAYXma/ls08//+UPdn+zft85PjqMO54+/tL+RLMs7q3xg/Vx\n7pEvDl5//ZtfPPnhzzx/Pj8ea99Ucvhhr+/f1ud+x8fDL+glHv3kw0O7+/aWiwpvj73+4sNP/s75\n5/7R09i99/r1gexmCSvsGet2PcRlPY13uJzK5fuPN+/evbZro0aj70QpB76WHx6+vX+sL+bv/qau\ng/ff/JVt+SVPgGp5iLbnH/6rv/ziZ46H8NiN96Jc11kcJoyLlulkiz2rtdf2IkRSv3SWZV48jSU1\nWzvcv3lHdPfy5V/RKzK/+vt/mw5rcILHwK5+IP5HiGku/cqFMyQkkfAALNAcp9dzq34qUm50u22n\nXe6DiIOJZDt/J9ZJN9f1uXK5bEw/fZ8grHldfad1+sZ58XD0dUL2taiIgLSDeWXZesm+5QSJUJmq\nT4OIoGqcelVap6cbGXPyx3qrh7L7hgWoEFsfUJ42u9uHsy2DN6P0BZPWohzO/QEGTfHb86W0NOG0\n6OwaSIYxMZKm//ELu1yJnnxy0qdbzU355gsVSs9IFWSOoDIgATeqMrMIig/2KwVA1flgV4gyOK4x\nR3N1cQkeoPL5dz/56ofa5IMfie5wGj/+ePnOr0lQMlMa53kdToZQYNCmUGNhzxAQryDeRMQkJszs\nKWRs4gwhSQoqa/nK8XtP99jLIxU7vjyOZ8vDTJmARIhJH+KgTZesrEhlikxJ5UpBxLqwonpIgRAh\nwYnUYCcKe+dP7javH6hAn+m8Lq/zqTxzRIA4OZFrQToTShehwoYUuAsnsnVYKR3EUi09qpOmQbMk\ns2ci5KN+uMZg31SGQfT987stggBXaCYGc2MaDq3CXFgzUFjhMYg1S3NuLE1DLMWhRAJQCrvD3/+i\nS1Htjx7UmUnf2tPPnRGIFqPQqMY2kvinzKgkIyl8qKaGM0g3nglhGBhMLIHgRCDS2Wy1ur29v/0T\nPd3GZonl5s3VhLjpaKRkpWfllUsWyg3FYsSrFaoERRDFxjHgAiHW5CRhI8mkTPSFh395r9dn39N1\nr/JnYznvFkcyNNSQSkqITXBwzlLePICqXXYKNvaUuGwLIiU1spAEJIjgNBCFHzZ/cfRnuL998X/4\n3c9vPvrq7Vd/d8vmaAIiDeKUwoAw0LSN+3XpJgcpRGkeptPaWFpn04klwQTPkZROGaNuHt3QNQ5/\n/JK39snY1A8PP0RxYSWGMRMplaqJKhA7DkthQGeQhGTkLgZvdBZlBJDIDGE37ek5PQrCm+Ptj3+/\n8H/8Mj5+2E2/+JaQSUwlE5UbSQ5iYYatRrW2FGEnH+qM8LKQUK1JIkQsGUQiEQju25I2xp5/7wvw\nv3j7bpN9Pfz8yp6mSaUyFSEe6QKCmyV012oTJxinCMzJLIgAdtKizFDJTFAKomb2ufzBH26Z7/7J\ntU4GkmBErBosXCGMLBwZQe4+TXWa26Y8emeKJA9EJwthnUQpqQiUiIJBaYHLNcbyu78jGVp+/I//\nwTv1NLyGBTKFrbhyQNoKhHNAeVNo4IB7701e9Um4D6ogJ4aEgDg5iC0ikbe19E++9fkbpeBy++k/\n/S9lI/NtuKdngItUCqgLexpl0OauHK9nfkJn0uW6nOeGESAQu3ApJFUERERBY1vK6Vv/4X9fagap\nXrfxb8ffuimPv7nzMXNGQ3ENhzhlwphOB/v+D5598Pzp5Pnienz547/b4u1H0MEkRBpAkgQ5fPAo\nn734wef34kSp2kuG/vYf/9ov35X0DIUABkOJ4EAws5z+17P13WdLeVtpROBpfu/Jp/smkGSEDGIE\nETlihNBvHeMSRtXYTSmX6vPzf/0H23d+VchDYTUoJQtFCjFrvPvpQ9uwnsHwo7Xx6Pr68KSCEmzV\nJRTlOl2JmFBefawhxhkUQjpQnEYZb99896+59ImgXpKQIQNUfVD6dlZwEYIfaS7IeaapJkCUJBL/\n36ZxWJRPli27IJLJlYWNU1KQ91+gy08HFQqJuFMS8yWMulFAt+xyU5hEsmQKicItQ0cQe1HKkNNn\n6smUHhkMFpdEgODb/1pcCERKkgKQsESQDHDagmFrBMN6JlVJygS4MUUQUY6ksJyfv5CSPSQ1EM7B\nmbCktd39H7gFAYBGBhgk5kaemcbp6RG6BAHMzj1BBFF1qkKDCeZDf+AU1DyILDRZkgTihMDnb3Mk\nkTBIQAS10WMEpE2zeHglkkKehzLo6iPDgpAamQFEUpbzJ9AkawtC4INd2XMwsXP9Dsk1MhAkhSg8\nsyRIyjyxRo7TJQJFoNKH9G6dxiBiBjrAOVyfvwkg1AleTUSRBC/WIu3mB3/ViykFgCQmF1MnZXB4\n+hJDuUnUoqx9WstSTQYixZCZFknyo2v1gHEZJUqQSk+SKJSk87e7gpJciJJNbbRoYgH4SCH3rj5F\nURfacNF0BK8lOZhg0RHrJ8KjDqohMWoxdiqaQc6y8vpDy0gi8kQmaK61qErVUgTbqUwyFwETqkhh\nERBdnTtSQI6Ynr/wSGZ2MtIwMJf0QuDhQptvliS27CAn5lpIUtrMfFlPY+kiuJyWy5lAhUUgzCQ9\nKcM9QUZfPDRuqxOPypEEDSYa1eqqgf233j7pvAtOSkKUFWcr5fr6SiO4rKYuAhvHOm+bdICJdThJ\nlz4wFjnDOdqi3rqwUygDDOfBkODtf/sNllGAYHK1vr59zXo5RV9OcpOZbV7RCVHr4y3HNG3yp23q\n6eCVB9TLMi+kRA5iJVNaqzGCmHe//euaNFpQisR6vP/81BuNNRPl5HalaQ1Yq5EvSp3vbn2flBkO\nJ5yvGwtiZxcakuBgtSJZACtpBJff+3XipJQg5Ah/52Y1E5draI9YqtNFV2mdSi1VaN2AMTiiU18d\nkEBIkDUTD00Ful63q1jJKDr/1q9WVCeBJFj3k4zz2UDNESMa315Xj2xDeZ50O3UjJbHsgesiAGUK\nkcnQXjJSNWQ0Z3UvEc79d/5eTStDiWyORo1vz8clm2cQ9R61uEkD5laazTPVNHe3yKszqlGDT5Qs\nkhRFUzuLl2QLHjqe/u6va0UgAc1DnIla2Z1HLKsnFfEI1TZTq0WjigQb5ejZ41pNKYUC7LNbvWwG\nKZKSczBC4V48Pm2TZImkdpXillZyY7L2kdbLyBAWzSrFlYBBSWEj4gGMokj06SwebDUZSpmyzANZ\nI9pw9fG6TDU44bWXSkHOIdTqmtGbsyPMpTCg6RGevlj6sjaSFFHXpEzqGvOpaBBaZ3BQyc5rmbg8\nJNNkakjT8DqkGDpgkaYoQ1KpphBHBFle3UZfsvAok9UooR5UkrskM6Q352QLNJpofVOWJYaZ09K7\nAxCSKhzJPc0ylUppERm0jNF79LTRr3MNJW+eIyNYTIMm4xAKHiIEwFmiwuty8WGDs2dmWjqxFk6w\nauT46ZlWnC4EDVxihK/SGKK8pgalmCAZDAVxejExaGceaUOHPiRySihAnBEISkKwiQcHrASJkQmc\nritZrOsNWwElIQWguoIN4ZpkmhEgOPUixFaRfMW0eJNOHMwZMZQok4wjQHqtVoI5IkaQ0+W0UREX\ntRBONXhqZ/KJQaAhkaJJKlFkZPE5j1eKMInhq6UHDbMe4bY6DYcVB5P1bjHG0nVbBUVMQcZeScgU\ngGsGxXyeHa4BTr5oyhyHl/dRbdPbopLhC2WM8PShPQenehnpERRXsxWTsm8j1TxhCJiyZwlSyGCr\nPKg65ms9hwKV/e7VgNNMxlkIvCaQMFLTESvCSTwR4zLclTaFFVirMNk8PHMy6CipnPDSObizGuE0\nahIV+NNXIZ1JuNegDS1MzOYERIyp8uAgt8uFvKI0rkwoYcji6fPZvXkZYHdJ51SmQG/HXmYX8CTT\nbiz9zQXZoBJ7DWfixpqZbaqYiC6+2hIVm8ZTIURoJqczejV1JRZ2GkLEI1MHr2fpOxoamLDbT76c\nehKhCpoHs7JSkTZVIo7e748rVW21Vm0SaTYcgCOYRZBWFMVcI5qD2cJXvgFBIXLZdH3oRsUKU+jO\nDBrQaFAMHnmih1E2hWvBzCaOYMUyWU0aasUhprIygV2CHWuC50LqSkAboLvjFQfpLQmVnaplwAPr\nEB/LpU/bJJ1mUYQF8eBlZnYqJjSma4Nau84GQQYvmTFCCOJSUstV7nfj3GsZNS2bOkh4jbGYnLFe\n/MAoJLUwGGIemmHgSC+hHkqmxVpX5+DAoisxXn2QoeRC6nNd9nU5s0ys3YHuPMjToo/urSGVapXi\n8EQmxYVzrV5gnGoI8KhJJGaMNYkIu29XkNEQoZn2M7XNO/t4OK1LjNVs2FjCl3W07RYKLlayUDKB\nIu0+qRd2IrgCgFZn4yC2+hpEhEd/+BviOuooVI12wxq09XFZyZPUY4Qbq5AYwNr3TZdEsLHXt280\ninmRzLYwJ/0/S7JsNfhWRLwAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<PIL.PpmImagePlugin.PpmImageFile image mode=L size=92x112 at 0x5790C50>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image.open(\"data/orl_faces/s1/1.pgm\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we feed this image to our read_image function, it will return as the numpy array,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "img = read_image('data/orl_faces/s1/1.pgm')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(112, 92)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "img.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we define another function get_data for generating our data. As we know, for the Siamese network, data should be in the form of pairs (genuine and imposite) with a binary label.\n",
    "\n",
    "First, we read the images (img1, img2) from the same directory and store them in the x_genuine_pair array and assign y_genuine to 1. Next, we read the images (img1, img2) from the different directory and store them in the x_imposite pair and assign y_imposite to 0.\n",
    "\n",
    "Finally, we concatenate both x_genuine_pair, x_imposite to X and y_genuine, y_imposite to Y:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "size = 2\n",
    "total_sample_size = 10000\n",
    "\n",
    "\n",
    "def get_data(size, total_sample_size):\n",
    "    #read the image\n",
    "    image = read_image('data/orl_faces/s' + str(1) + '/' + str(1) + '.pgm', 'rw+')\n",
    "    #reduce the size\n",
    "    image = image[::size, ::size]\n",
    "    #get the new size\n",
    "    dim1 = image.shape[0]\n",
    "    dim2 = image.shape[1]\n",
    "\n",
    "    count = 0\n",
    "    \n",
    "    #initialize the numpy array with the shape of [total_sample, no_of_pairs, dim1, dim2]\n",
    "    x_geuine_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2])  # 2 is for pairs\n",
    "    y_genuine = np.zeros([total_sample_size, 1])\n",
    "    \n",
    "    for i in range(40):\n",
    "        for j in range(int(total_sample_size/40)):\n",
    "            ind1 = 0\n",
    "            ind2 = 0\n",
    "            \n",
    "            #read images from same directory (genuine pair)\n",
    "            while ind1 == ind2:\n",
    "                ind1 = np.random.randint(10)\n",
    "                ind2 = np.random.randint(10)\n",
    "            \n",
    "            # read the two images\n",
    "            img1 = read_image('data/orl_faces/s' + str(i+1) + '/' + str(ind1 + 1) + '.pgm', 'rw+')\n",
    "            img2 = read_image('data/orl_faces/s' + str(i+1) + '/' + str(ind2 + 1) + '.pgm', 'rw+')\n",
    "            \n",
    "            #reduce the size\n",
    "            img1 = img1[::size, ::size]\n",
    "            img2 = img2[::size, ::size]\n",
    "            \n",
    "            #store the images to the initialized numpy array\n",
    "            x_geuine_pair[count, 0, 0, :, :] = img1\n",
    "            x_geuine_pair[count, 1, 0, :, :] = img2\n",
    "            \n",
    "            #as we are drawing images from the same directory we assign label as 1. (genuine pair)\n",
    "            y_genuine[count] = 1\n",
    "            count += 1\n",
    "\n",
    "    count = 0\n",
    "    x_imposite_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2])\n",
    "    y_imposite = np.zeros([total_sample_size, 1])\n",
    "    \n",
    "    for i in range(int(total_sample_size/10)):\n",
    "        for j in range(10):\n",
    "            \n",
    "            #read images from different directory (imposite pair)\n",
    "            while True:\n",
    "                ind1 = np.random.randint(40)\n",
    "                ind2 = np.random.randint(40)\n",
    "                if ind1 != ind2:\n",
    "                    break\n",
    "                    \n",
    "            img1 = read_image('data/orl_faces/s' + str(ind1+1) + '/' + str(j + 1) + '.pgm', 'rw+')\n",
    "            img2 = read_image('data/orl_faces/s' + str(ind2+1) + '/' + str(j + 1) + '.pgm', 'rw+')\n",
    "\n",
    "            img1 = img1[::size, ::size]\n",
    "            img2 = img2[::size, ::size]\n",
    "\n",
    "            x_imposite_pair[count, 0, 0, :, :] = img1\n",
    "            x_imposite_pair[count, 1, 0, :, :] = img2\n",
    "            #as we are drawing images from the different directory we assign label as 0. (imposite pair)\n",
    "            y_imposite[count] = 0\n",
    "            count += 1\n",
    "            \n",
    "    #now, concatenate, genuine pairs and imposite pair to get the whole data\n",
    "    X = np.concatenate([x_geuine_pair, x_imposite_pair], axis=0)/255\n",
    "    Y = np.concatenate([y_genuine, y_imposite], axis=0)\n",
    "\n",
    "    return X, Y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we generate our data and check our data size. As you can see we have 20,000 data points, out of these 10,000 are genuine pairs and 10,000 are imposite pairs. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X, Y = get_data(size, total_sample_size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(20000, 2, 1, 56, 46)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(20000, 1)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we split our data for training and testing with 75% training and 25% testing proportions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=.25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that, we have successfully generated our data, we build our siamese network. First, we define the base network which is basically a convolutional network used for feature extraction. We build two convolutional layers with rectified linear unit (ReLU) activations and max pooling followed by flat layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def build_base_network(input_shape):\n",
    "    \n",
    "    seq = Sequential()\n",
    "    \n",
    "    nb_filter = [6, 12]\n",
    "    kernel_size = 3\n",
    "    \n",
    "    \n",
    "    #convolutional layer 1\n",
    "    seq.add(Convolution2D(nb_filter[0], kernel_size, kernel_size, input_shape=input_shape,\n",
    "                          border_mode='valid', dim_ordering='th'))\n",
    "    seq.add(Activation('relu'))\n",
    "    seq.add(MaxPooling2D(pool_size=(2, 2)))  \n",
    "    seq.add(Dropout(.25))\n",
    "    \n",
    "    #convolutional layer 2\n",
    "    seq.add(Convolution2D(nb_filter[1], kernel_size, kernel_size, border_mode='valid', dim_ordering='th'))\n",
    "    seq.add(Activation('relu'))\n",
    "    seq.add(MaxPooling2D(pool_size=(2, 2), dim_ordering='th')) \n",
    "    seq.add(Dropout(.25))\n",
    "\n",
    "    #flatten \n",
    "    seq.add(Flatten())\n",
    "    seq.add(Dense(128, activation='relu'))\n",
    "    seq.add(Dropout(0.1))\n",
    "    seq.add(Dense(50, activation='relu'))\n",
    "    return seq\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we feed the image pair, to the base network, which will return the embeddings that is, feature vectors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "input_dim = x_train.shape[2:]\n",
    "img_a = Input(shape=input_dim)\n",
    "img_b = Input(shape=input_dim)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\dineshp\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py:11: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(6, (3, 3), input_shape=(1, 56, 46..., padding=\"valid\", data_format=\"channels_first\")`\n",
      "  # This is added back by InteractiveShellApp.init_path()\n",
      "C:\\Users\\dineshp\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py:17: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(12, (3, 3), padding=\"valid\", data_format=\"channels_first\")`\n",
      "C:\\Users\\dineshp\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py:19: UserWarning: Update your `MaxPooling2D` call to the Keras 2 API: `MaxPooling2D(pool_size=(2, 2), data_format=\"channels_first\")`\n"
     ]
    }
   ],
   "source": [
    "base_network = build_base_network(input_dim)\n",
    "feat_vecs_a = base_network(img_a)\n",
    "feat_vecs_b = base_network(img_b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These feat_vecs_a and feat_vecs_b are the feature vectors of our image pair. Next, we feed this feature vectors to the energy function to compute the distance between them, we use Euclidean distance as our energy function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def euclidean_distance(vects):\n",
    "    x, y = vects\n",
    "    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))\n",
    "\n",
    "\n",
    "def eucl_dist_output_shape(shapes):\n",
    "    shape1, shape2 = shapes\n",
    "    return (shape1[0], 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([feat_vecs_a, feat_vecs_b])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Now, we set the epoch length to 13 and we use RMS prop for optimization and define our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "epochs = 13\n",
    "rms = RMSprop()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\dineshp\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py:1: UserWarning: Update your `Model` call to the Keras 2 API: `Model(inputs=[<tf.Tenso..., outputs=Tensor(\"la...)`\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    }
   ],
   "source": [
    "model = Model(input=[img_a, img_b], output=distance)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we define our loss function as contrastive_loss function and compile the model. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def contrastive_loss(y_true, y_pred):\n",
    "    margin = 1\n",
    "    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model.compile(loss=contrastive_loss, optimizer=rms)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "img_1 = x_train[:, 0]\n",
    "img2 = x_train[:, 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\dineshp\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py:2: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.\n",
      "  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train on 11250 samples, validate on 3750 samples\n",
      "Epoch 1/13\n",
      " - 60s - loss: 0.1650 - val_loss: 0.1287\n",
      "Epoch 2/13\n",
      " - 58s - loss: 0.0897 - val_loss: 0.0932\n",
      "Epoch 3/13\n",
      " - 58s - loss: 0.0666 - val_loss: 0.0564\n",
      "Epoch 4/13\n",
      " - 59s - loss: 0.0545 - val_loss: 0.0554\n",
      "Epoch 5/13\n",
      " - 59s - loss: 0.0449 - val_loss: 0.0291\n",
      "Epoch 6/13\n",
      " - 58s - loss: 0.0399 - val_loss: 0.0253\n",
      "Epoch 7/13\n",
      " - 58s - loss: 0.0360 - val_loss: 0.0273\n",
      "Epoch 8/13\n"
     ]
    }
   ],
   "source": [
    "model.fit([img_1, img2], y_train, validation_split=.25,\n",
    "          batch_size=128, verbose=2, nb_epoch=epochs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we make predictions with test data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "pred = model.predict([x_test[:, 0], x_test[:, 1]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def compute_accuracy(predictions, labels):\n",
    "    return labels[predictions.ravel() < 0.5].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we check our model accuracy. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "compute_accuracy(pred, y_test)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
